Using random subspace method for prediction and variable importance assessment in linear regression

Authors

  • Jan Mielniczuk
  • Pawel Teisseyre
Abstract

A random subset method (RSM) with a new weighting scheme is proposed and investigated for linear regression with a large number of features. The weight of each variable is defined as the average of the squared values of its t-statistic over models fitted on randomly chosen subsets of features. It is argued that such weighting is advisable because it combines two factors: a measure of the importance of the variable within the considered model and a measure of the goodness of fit of the model itself. The asymptotic weights assigned by this scheme are determined, as are the assumptions under which the method leads to a consistent choice of the significant variables in the model. Numerical experiments indicate that the proposed method performs promisingly when its prediction errors are compared with those of penalty-based methods such as the lasso, and that it has a much smaller false discovery rate than the other methods considered. © 2012 Elsevier B.V. All rights reserved.
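The weighting step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes ordinary least squares with an intercept in every submodel, feature subsets of a fixed size drawn uniformly without replacement, and a weight equal to the mean squared t-statistic over the submodels that actually contain the variable. The function `rsm_weights` and its parameters `subspace_size`, `n_draws` and `seed` are hypothetical names. The paper's rules for choosing the subset size and for turning the weights into a final model are not reproduced here.

```python
import numpy as np


def rsm_weights(X, y, subspace_size, n_draws, seed=None):
    """Average squared OLS t-statistics over random feature subsets.

    Hypothetical helper for illustration; names and the fixed subset size
    are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if subspace_size + 1 >= n:
        raise ValueError("need n > subspace_size + 1 for residual degrees of freedom")
    sum_t2 = np.zeros(p)  # running sum of squared t-statistics per variable
    counts = np.zeros(p)  # number of submodels that contained each variable
    for _ in range(n_draws):
        # Draw a random subset of features without replacement.
        idx = rng.choice(p, size=subspace_size, replace=False)
        # Submodel design matrix: intercept column plus the chosen features.
        Z = np.column_stack([np.ones(n), X[:, idx]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        dof = n - Z.shape[1]
        sigma2 = resid @ resid / dof  # residual variance of this submodel
        # Standard errors from sigma^2 * (Z^T Z)^{-1}; drop the intercept entry.
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))[1:]
        t = beta[1:] / se
        sum_t2[idx] += t ** 2  # idx has no repeats, so fancy-index += is safe
        counts[idx] += 1
    # Weight = mean squared t-statistic over submodels containing the variable;
    # variables that were never drawn keep weight 0.
    return np.divide(sum_t2, counts, out=np.zeros(p), where=counts > 0)


if __name__ == "__main__":
    # Synthetic example: only features 0 and 1 influence the response.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 20))
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(200)
    w = rsm_weights(X, y, subspace_size=5, n_draws=500, seed=1)
    print("features ranked by weight:", np.argsort(w)[::-1][:5])
```

In OLS the squared t-statistic of a variable equals the partial F-statistic for dropping it, so it is scaled by the submodel's residual variance estimate. A submodel that fits poorly has a large residual variance, which shrinks every t-statistic in it; this is one way to read the abstract's claim that the weight mixes within-model importance with the goodness of fit of the model.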

Similar resources

Extensions to Quantile Regression Forests for Very High-Dimensional Data

This paper describes new extensions to the state-of-the-art regression random forests Quantile Regression Forests (QRF) for applications to high dimensional data with thousands of features. We propose a new subspace sampling method that randomly samples a subset of features from two separate feature sets, one containing important features and the other one containing less important features. Th...

Comparison of artificial neural network with logistic regression in prediction of tendency to surgical intervention in nurses

Introduction: Logistic regression is one of the modeling methods for binary dependent variables. Artificial neural networks, on the other hand, are a flexible method with few limitations. The growing number of unnecessary cosmetic surgeries and the importance of prediction and classification led us to undertake the present study, with the aim of comparing logistic regression and artificial ...

Application of Linear Regression and Artificial Neural Network for Broiler Chicken Growth Performance Prediction

This study was conducted to investigate the prediction of growth performance using linear regression and artificial neural network (ANN) in broiler chicken. Artificial neural networks (ANNs) are powerful tools for modeling systems in a wide range of applications. The ANN model with a back propagation algorithm successfully learned the relationship between the inputs of metabolizable energy (kca...

Support vector regression with random output variable and probabilistic constraints

Support Vector Regression (SVR) solves regression problems based on the concept of the Support Vector Machine (SVM). In this paper, a new SVR model with probabilistic constraints is proposed in which the output data and the bias are treated as random variables with uniform probability distributions. Using the proposed method, the optimal regression hyperplane can be obtained by solving a quadrati...

متن کامل

Variable Importance Assessment in Regression: Linear Regression versus Random Forest

Relative importance of regressor variables is an old topic that still awaits a satisfactory solution. When interest is in attributing importance in linear regression, methods that decompose R2 by averaging over orderings are among the state-of-the-art methods, although the mechanism behind their behavior is not (yet) completely understood. Random forests, a machine-learning tool for classification a...


Journal:
  • Computational Statistics & Data Analysis

Volume: 71

Published: 2014